PasMoQAP: A Parallel Asynchronous Memetic Algorithm for solving the Multi-Objective Quadratic Assignment Problem
Multi-Objective Optimization Problems (MOPs) have attracted growing attention
over the last decades. Multi-Objective Evolutionary Algorithms (MOEAs) have
been extensively used to address MOPs because they are able to approximate a
set of non-dominated, high-quality solutions. The Multi-Objective Quadratic
Assignment Problem (mQAP) is a MOP and a generalization of the classical QAP,
which has been extensively studied and used in several real-life applications.
The mQAP takes as input several flows between the facilities,
which generate multiple cost functions that must be optimized simultaneously.
In this study, we propose PasMoQAP, a parallel asynchronous memetic algorithm
to solve the Multi-Objective Quadratic Assignment Problem. PasMoQAP is based on
an island model that structures the population into sub-populations. The
memetic algorithm on each island independently evolves a reduced population of
solutions, and the islands cooperate asynchronously by sending selected
solutions to their neighbors. The experimental results show that our approach
significantly outperforms all the island-based variants of the
multi-objective evolutionary algorithm NSGA-II. We show that PasMoQAP is a
suitable alternative for solving the Multi-Objective Quadratic Assignment Problem.
Comment: 8 pages, 3 figures, 2 tables. Accepted at the Congress on Evolutionary
Computation 2017 (CEC 2017).
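The island-model cooperation described above can be sketched in a few lines. This is an illustrative toy (single-objective, with invented parameters such as the island count, population size, and migration interval), not the PasMoQAP implementation:

```python
# Illustrative sketch of an island model with asynchronous migration,
# in the spirit of PasMoQAP (not the authors' implementation).
# Each island evolves its own sub-population of permutations for a toy
# QAP instance and periodically pushes its best solution to a neighbour.
import random
from collections import deque

random.seed(0)
N = 6  # number of facilities/locations in the toy instance
flow = [[random.randint(0, 9) for _ in range(N)] for _ in range(N)]
dist = [[abs(i - j) for j in range(N)] for i in range(N)]

def cost(perm):
    # classic QAP objective: sum of flow * distance under the assignment
    return sum(flow[i][j] * dist[perm[i]][perm[j]]
               for i in range(N) for j in range(N))

def mutate(perm):
    a, b = random.sample(range(N), 2)
    child = list(perm)
    child[a], child[b] = child[b], child[a]
    return child

def local_search(perm):
    # memetic component: first-improvement 2-swap hill climbing
    best = list(perm)
    for a in range(N):
        for b in range(a + 1, N):
            cand = list(best)
            cand[a], cand[b] = cand[b], cand[a]
            if cost(cand) < cost(best):
                best = cand
    return best

n_islands = 3
pops = [[random.sample(range(N), N) for _ in range(4)] for _ in range(n_islands)]
inboxes = [deque() for _ in range(n_islands)]  # asynchronous message queues

for generation in range(30):
    for isl in range(n_islands):
        # absorb migrants whenever they happen to be available (asynchrony)
        while inboxes[isl]:
            pops[isl].append(inboxes[isl].popleft())
        # evolve: mutate + local search, keep the fittest
        pops[isl].extend(local_search(mutate(p)) for p in list(pops[isl]))
        pops[isl].sort(key=cost)
        pops[isl] = pops[isl][:4]
        # occasionally send the current best solution to the next island
        if generation % 5 == 0:
            inboxes[(isl + 1) % n_islands].append(list(pops[isl][0]))

best = min((p for pop in pops for p in pop), key=cost)
print("best assignment:", best, "cost:", cost(best))
```

In the real multi-objective setting each island would maintain a set of non-dominated solutions instead of a single-objective ranking.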
Entropy-based High Performance Computation of Boolean SNP-SNP Interactions Using GPUs
It is increasingly accepted that traditional statistical Single
Nucleotide Polymorphism (SNP) analysis of Genome-Wide Association
Studies (GWAS) reveals just a small part of the heritability in complex
diseases. Studying SNP interactions identifies additional SNPs that contribute
to disease but that do not reach genome-wide significance or that exhibit only
epistatic effects. We have introduced a methodology for genome-wide screening
of epistatic interactions that is feasible for state-of-the-art
high-performance computing technology. Unlike standard software,
our method computes all boolean binary interactions between SNPs across
the whole genome without assuming a particular model of interaction.
Our extensive search for epistasis comes at the expense of higher computational
complexity, which we tackled using graphics processors (GPUs) to reduce the
computational time from several months on a cluster of CPUs to 3-4 days on a
multi-GPU platform. Here, we contribute a new
entropy-based function to evaluate the interaction between SNPs
that does not compromise findings about the most significant SNP
interactions, but is more than 4000 times lighter in terms of computational time
when running on GPUs, and more than 100x faster than on a CPU of similar cost.
We deploy a number of optimization techniques to tune the implementation of
this function using CUDA and show how to enhance scalability on larger data sets.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. This
work was also supported by the Australian Research Council Future Fellowship
to Prof. Moscato, by a funded grant from the ARC Discovery Project Scheme
and by the Ministry of Education of Spain under Project TIN2006-01078 and
mobility grant PR2011-0144. We also thank NVIDIA for hardware donation under
CUDA Teaching and Research Center awards.
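To convey the idea of an entropy-based interaction score, here is a minimal CPU sketch that measures the mutual information between a boolean SNP-SNP feature and the phenotype. It is an assumption-laden illustration, not the paper's exact function or its CUDA implementation:

```python
# Hedged sketch of an entropy-based pairwise SNP score: the mutual
# information between a boolean combination of two SNPs and the
# phenotype. Illustration only; runs on CPU, whereas the work above
# targets GPUs, and the toy genotypes below are invented.
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def interaction_score(snp_a, snp_b, phenotype, op):
    # information gained about the phenotype by the boolean SNP-SNP feature
    feature = [op(a, b) for a, b in zip(snp_a, snp_b)]
    h_p = entropy(phenotype)
    n = len(phenotype)
    h_cond = 0.0
    for v in (0, 1):
        idx = [i for i in range(n) if feature[i] == v]
        if idx:
            h_cond += len(idx) / n * entropy([phenotype[i] for i in idx])
    return h_p - h_cond  # mutual information, in bits

# toy data: the AND of the two SNPs determines disease status exactly
snp1 = [1, 1, 0, 0, 1, 0, 1, 0]
snp2 = [1, 0, 1, 0, 1, 1, 1, 0]
status = [a & b for a, b in zip(snp1, snp2)]
score = interaction_score(snp1, snp2, status, lambda a, b: a & b)
print(f"MI(AND(snp1, snp2); status) = {score:.3f} bits")
```

The genome-wide version evaluates such a score for every SNP pair and every boolean operator, which is what makes GPU acceleration worthwhile.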
Graph algorithms for machine learning: a case-control study based on prostate cancer populations and high throughput transcriptomic data
Background
The continuing proliferation of high-throughput biological data promises to revolutionize personalized medicine. Confirming the presence or absence of disease is an important goal. In this study, we seek to identify genes, gene products and biological pathways that are crucial to human health, with prostate cancer chosen as the target disease.
Materials and methods
Using case-control transcriptomic data, we devise a graph theoretical toolkit for this task. It employs both innovative algorithms and novel two-way correlations to pinpoint putative biomarkers that classify unknown samples as cancerous or normal.
Results and conclusion
Observed accuracy on real data suggests that we are able to achieve a sensitivity of 92% and a specificity of 91%.
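For reference, the sensitivity and specificity figures quoted above come from a confusion matrix; a minimal sketch with invented labels:

```python
# Minimal sketch: computing sensitivity and specificity for a
# case/control classifier, the two figures reported above.
# The labels and predictions below are illustrative, not the study's data.
def sensitivity_specificity(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)  # sensitivity, specificity

y_true = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = cancerous, 0 = normal
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)  # 0.75 0.75 on this toy data
```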
Quantifying the regeneration of bone tissue in biomedical images via Legendre moments
Article published in the conference proceedings.
We investigate the use of Legendre moments as biomarkers for an efficient and accurate
classification of bone tissue on images coming from stem cell regeneration studies.
Regions of existing bone, cartilage, or new bone-forming cells are
characterized at tile level to quantify the degree of bone regeneration
depending on culture conditions. Legendre moments are analyzed
from three different perspectives:
(1) their discriminant properties in a wide set of preselected feature
vectors based on our clinical and computational experience, providing solutions
whose accuracy exceeds 90%;
(2) the amount of information to be retained when using Principal Component
Analysis (PCA) to reduce the dimensionality of the problem from 2 to 6 dimensions;
(3) the use of the (α,β)-k-feature set problem to identify the k = 4
features that are most relevant to our analysis, from a combinatorial optimization approach.
These techniques are compared in terms of computational complexity
and classification accuracy to assess the strengths and limitations of the use
of Legendre moments for this biomedical image processing application.
Universidad de Málaga, Campus de Excelencia Internacional Andalucía Tech.
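A hedged sketch of the underlying descriptor: discrete low-order Legendre moments of a grayscale tile, using a textbook approximation rather than the paper's pipeline (the tile contents and the normalisation are illustrative assumptions):

```python
# Textbook discrete approximation of Legendre moments of an image tile,
# the kind of shape/texture descriptor discussed above. Illustration
# only; not the paper's classification pipeline.
import numpy as np
from numpy.polynomial.legendre import Legendre

def legendre_moment(tile, p, q):
    h, w = tile.shape
    # map pixel indices into [-1, 1], the Legendre polynomials' domain
    y = np.linspace(-1, 1, h)
    x = np.linspace(-1, 1, w)
    Pp = Legendre.basis(p)(y)   # P_p evaluated at row coordinates
    Pq = Legendre.basis(q)(x)   # P_q evaluated at column coordinates
    norm = (2 * p + 1) * (2 * q + 1) / (h * w)  # discrete normalisation
    return norm * Pp @ tile @ Pq

tile = np.zeros((32, 32))
tile[:16, :] = 1.0  # top half bright: strong vertical asymmetry

m00 = legendre_moment(tile, 0, 0)  # overall intensity
m10 = legendre_moment(tile, 1, 0)  # vertical (row-wise) asymmetry
print(m00, m10)
```

A feature vector for a tile would stack several such moments (p, q) up to a chosen order before feeding them to a classifier.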
Hierarchical Clustering Using the Arithmetic-Harmonic Cut: Complexity and Experiments
Clustering, particularly hierarchical clustering, is an important method for understanding and analysing data across a wide variety of knowledge domains, with notable utility in systems where the data can be classified in an evolutionary context. This paper introduces a new hierarchical clustering problem defined by a novel objective function we call the arithmetic-harmonic cut. We show that the problem of finding such a cut is NP-hard and APX-hard but is fixed-parameter tractable, which indicates that although the problem is unlikely to have a polynomial-time algorithm (even for approximation), exact parameterized and local search based techniques may produce workable algorithms. To this end, we implement a memetic algorithm for the problem and demonstrate the effectiveness of the arithmetic-harmonic cut on a number of datasets, including a cancer-type dataset and a coronavirus dataset. We show favorable performance compared to currently used hierarchical clustering techniques such as k-Means, Graclus and Normalized-Cut. The arithmetic-harmonic cut metric overcomes difficulties that other hierarchical methods have in representing both intercluster differences and intracluster similarities.
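As a loose illustration of mixing arithmetic and harmonic means when scoring a cut, the sketch below rates a two-way partition by the arithmetic mean of inter-cluster distances over the harmonic mean of intra-cluster distances. This is an assumed, simplified objective for intuition only, not the paper's exact arithmetic-harmonic cut definition:

```python
# Illustrative only: a simplified cut score combining the arithmetic
# mean of inter-cluster distances with the harmonic mean of
# intra-cluster distances. NOT the paper's exact objective.
from itertools import combinations

def ah_score(points, labels, dist):
    inter, intra = [], []
    for i, j in combinations(range(len(points)), 2):
        d_ij = dist(points[i], points[j])
        (inter if labels[i] != labels[j] else intra).append(d_ij)
    arithmetic = sum(inter) / len(inter)                 # separation across the cut
    harmonic = len(intra) / sum(1.0 / d for d in intra)  # cohesion within clusters
    return arithmetic / harmonic  # larger = tighter clusters, further apart

d = lambda a, b: abs(a - b)
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
good = [0, 0, 0, 1, 1, 1]  # respects the two natural groups
bad = [0, 1, 0, 1, 0, 1]   # mixes the groups
print(ah_score(points, good, d), ah_score(points, bad, d))
```

A memetic algorithm for such an objective would search over label assignments, which is where the hardness results above bite.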
Uncovering Molecular Biomarkers That Correlate Cognitive Decline with the Changes of Hippocampus' Gene Expression Profiles in Alzheimer's Disease
Background: Alzheimer’s disease (AD) is characterized by a neurodegenerative progression that alters cognition. On a phenotypical level, cognition is evaluated by means of the Mini-Mental State Examination (MMSE), and the post-mortem examination of the Neurofibrillary Tangle count (NFT) helps to confirm an AD diagnosis. The MMSE evaluates different aspects of cognition, including orientation, short-term memory (retention and recall), attention and language. As there is a normal cognitive decline with aging, and death is the final state at which NFT can be counted, the identification of brain gene expression biomarkers from these phenotypical measures has been elusive. Methodology/Principal Findings: We have reanalysed a microarray dataset contributed in 2004 by Blalock et al. of 31 samples corresponding to hippocampus gene expression from 22 AD subjects of varying degrees of severity and 9 controls. Instead of relying only on correlations of gene expression with the associated MMSE and NFT measures, and by using modern bioinformatics methods based on information theory and combinatorial optimization, we uncovered a 1,372-probe gene expression signature that presents a high consensus with established markers of progression in AD. The signature reveals alterations in calcium, insulin, phosphatidylinositol and Wnt signalling. Among the gene probes most correlated with AD severity we found those linked to synaptic function, neurofilament bundle assembly and neuronal plasticity. Conclusions/Significance: A transcription factor analysis of the 1,372-probe signature reveals significant associations with the EGR/KROX family of proteins, MAZ, and E2F1. The gene homologous to EGR1 (zif268, Egr-1 or Zenk), together with other members of the EGR family, is consolidating a key role in neuronal plasticity in the brain. These results indicate a degree of commonality between putative genes involved in AD and prion-induced neurodegenerative processes that warrants further investigation.
A Kernelisation Approach for Multiple d-Hitting Set and Its Application in Optimal Multi-Drug Therapeutic Combinations
Therapies consisting of a combination of agents are an attractive proposition,
especially in the context of diseases such as cancer, which can manifest with a
variety of tumor types in a single case. However, uncovering usable drug
combinations is expensive both financially and temporally. By employing
computational methods to identify candidate combinations with a greater
likelihood of success we can avoid these problems, even when the amount of data
is prohibitively large. Hitting Set is a combinatorial problem
with useful applications across many fields; however, as it is
NP-complete, it is traditionally considered hard to solve
exactly. We introduce a more general version of the problem,
(α,β,d)-Hitting Set,
which allows more precise control over how and what the hitting set targets.
Employing the framework of Parameterized Complexity we show that despite being
NP-complete, the
(α,β,d)-Hitting Set
problem is fixed-parameter tractable with a kernel of size O(α^d k^d) when we
parameterize by the size k of the hitting set and the maximum α of the minimum
numbers of hits, and taking the maximum degree d of the target sets as a
constant. We demonstrate the application of this problem to multiple drug
selection for cancer therapy, showing the flexibility of the problem in
tailoring such drug sets. The fixed-parameter tractability result indicates that
for low values of the parameters the problem can be solved quickly using exact
methods. We also demonstrate that the problem is indeed practical, with
computation times on the order of 5 seconds, as compared to previous Hitting Set
applications using the same dataset, which exhibited times on the order of 1 day,
even with relatively relaxed notions of what constitutes a low value for the
parameters. Furthermore the existence of a kernelization for
(α,β,d)-Hitting Set
indicates that the problem is readily scalable to large datasets.
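To make the Hitting Set setting concrete, here is a toy greedy heuristic with invented drug and target names. The paper itself uses exact parameterized (kernelization) methods rather than this greedy approximation:

```python
# Toy sketch of Hitting Set in the drug-combination setting: each
# target set (e.g. a tumour subtype) must be "hit" by at least one
# selected drug. Greedy heuristic for illustration only; the work
# above solves a generalised version exactly via kernelization.
def greedy_hitting_set(targets):
    # repeatedly pick the element hitting the most not-yet-hit sets
    remaining = [set(t) for t in targets]
    chosen = set()
    while remaining:
        universe = set().union(*remaining)
        best = max(universe, key=lambda e: sum(e in t for t in remaining))
        chosen.add(best)
        remaining = [t for t in remaining if best not in t]
    return chosen

# hypothetical drugs effective against each tumour subtype
targets = [
    {"drugA", "drugB"},
    {"drugB", "drugC"},
    {"drugC", "drugD"},
    {"drugB", "drugD"},
]
combo = greedy_hitting_set(targets)
print(combo)  # a set of 2 drugs covering all four subtypes
```

The (α,β,d) generalisation additionally controls how many times each set must be hit, which the greedy sketch above does not model.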
Iteratively refining breast cancer intrinsic subtypes in the METABRIC dataset
BACKGROUND: Multi-gene lists and single sample predictor models are currently used to reduce the multidimensional complexity of breast cancers, and to identify intrinsic subtypes. The perceived inability of some models to deal with the challenges of processing high-dimensional data, however, limits the accurate characterisation of these subtypes. Towards the development of robust strategies, we designed an iterative approach to consistently discriminate intrinsic subtypes and improve class prediction in the METABRIC dataset. FINDINGS: In this study, we employed the CM1 score to identify the most discriminative probes for each group, and an ensemble learning technique to assess the ability of these probes to assign subtype labels using 24 different classifiers. Our analysis comprises an iterative computation of these methods and statistical measures performed on a set of over 2000 samples. The refined labels assigned using this iterative approach proved to be more consistent and in better agreement with clinicopathological markers and patients' overall survival than those originally provided by the PAM50 method. CONCLUSIONS: The assignment of intrinsic subtypes has a significant impact on translational research for both understanding and managing breast cancer. The refined labelling, therefore, provides more accurate and reliable information by improving the source of fundamental science prior to clinical applications in medicine.
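The probe-selection step can be illustrated with a simple class-separation score, shown here in the spirit of (but not identical to) the CM1-based ranking; the scoring formula and the toy expression values are assumptions:

```python
# Hedged sketch: ranking probes by a simple class-separation score
# (mean difference scaled by the pooled spread). This is an assumed
# stand-in for illustration, not the exact CM1 formula, and the
# expression values are invented.
import statistics

def separation_score(values_a, values_b):
    spread = max(values_a + values_b) - min(values_a + values_b)
    return abs(statistics.mean(values_a) - statistics.mean(values_b)) / spread

# toy log-expression values for two probes across two subtype groups
probe1 = ([5.1, 5.0, 5.2], [8.9, 9.1, 9.0])  # strongly discriminative
probe2 = ([5.0, 9.0, 7.0], [6.0, 8.0, 7.1])  # weakly discriminative
s1 = separation_score(*probe1)
s2 = separation_score(*probe2)
print(s1, s2)
```

In the iterative pipeline above, the top-ranked probes per subtype would then be handed to an ensemble of classifiers to re-assign labels, and the ranking recomputed.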
Learning to extrapolate using continued fractions: Predicting the critical temperature of superconductor materials
In Artificial Intelligence we often seek to identify an unknown target
function y = f(x) of many variables, given a limited set of instances
S = {(x^(i), y^(i))} with x^(i) ∈ D, where D is a
domain of interest. We refer to S as the training set, and the final quest is
to identify the mathematical model that approximates this target function for
new x in a test set T ⊂ D with S ∩ T = ∅ (i.e. thus
testing the model's generalisation). However, for some
applications, the main interest is approximating well the unknown function on a
larger domain D' that contains D. In cases involving the design of new
structures, for instance, we may be interested in maximizing f; thus, the
model derived from S alone should also generalize well in D' for samples
with values of y larger than the largest observed in S. In that sense, the
AI system would provide important information that could guide the design
process, e.g., using the learned model as a surrogate function to design new
lab experiments.
We introduce a method for multivariate regression based on iterative fitting
of a continued fraction by incorporating additive spline models. We compared it
with established methods such as AdaBoost, Kernel Ridge, Linear Regression,
Lasso Lars, Linear Support Vector Regression, Multi-Layer Perceptrons, Random
Forests, Stochastic Gradient Descent and XGBoost. We tested the performance on
the important problem of predicting the critical temperature of superconductors
based on physical-chemical characteristics.
Comment: Submitted to IEEE Transactions on Artificial Intelligence (TAI).
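To see why continued fractions help with extrapolation, the sketch below uses Thiele's classical continued-fraction interpolation, a simpler textbook relative of the paper's spline-based iterative fitting. It recovers a rational target exactly from three samples and extrapolates correctly well outside the sampled range:

```python
# Thiele continued-fraction interpolation (textbook scheme, shown for
# intuition; the paper's method iteratively fits fractions with
# additive spline models instead of interpolating).
def thiele_coefficients(xs, ys):
    # inverted differences: coeffs[k] = phi_k(x_k)
    n = len(xs)
    prev = list(ys)              # phi_0 evaluated at every node
    coeffs = [prev[0]]
    for k in range(1, n):
        cur = [None] * n
        for i in range(k, n):
            cur[i] = (xs[i] - xs[k - 1]) / (prev[i] - prev[k - 1])
        coeffs.append(cur[k])
        prev = cur
    return coeffs

def thiele_eval(xs, coeffs, x):
    # f(x) = a0 + (x - x0)/(a1 + (x - x1)/(a2 + ...))
    val = coeffs[-1]
    for k in range(len(coeffs) - 2, -1, -1):
        val = coeffs[k] + (x - xs[k]) / val
    return val

f = lambda x: (3 * x + 1) / (x + 2)   # rational target function
xs = [0.0, 1.0, 2.0]                  # sampled training domain D
coeffs = thiele_coefficients(xs, [f(x) for x in xs])
pred = thiele_eval(xs, coeffs, 8.0)   # extrapolate far outside D
print(pred, f(8.0))                   # ≈ 2.5 for both, up to rounding
```

A polynomial fitted to the same three points would diverge from this target away from the samples; the continued fraction captures its rational asymptote, which is the motivation for the models compared above.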